Using a Multidimensionality-Based Framework to Identify and Interpret the Construct-Related Dimensions that Elicit Group Differences
Abstract
The Roussos-Stout (1996) multidimensionality-based DIF analysis framework emphasizes a substantively informed selection of items for both the matching and studied subtests, based on the dimensions suspected of underlying the test data. By contrast, standard DIF practice encourages an exploratory search for matching subtest items based on purely statistical criteria, such as a failure to display DIF. Using two examples, we demonstrate that these two approaches lead to different interpretations about the occurrence of DIF in a test. We argue that selecting the matching and studied subtests with the multidimensional framework can lead to a more informed understanding of why DIF occurs.
According to the authors of the Standards for Educational and Psychological Testing (1999), bias occurs when tests yield scores or promote score interpretations that result in different meanings for members of different groups (e.g., race, ethnicity, language, culture, gender, disability, or socio-economic status). Bias is often attributed to construct-irrelevant dimensions that differentially affect the test scores for different groups of examinees. Group differences can also be attributed to item impact. Impact occurs when construct-relevant dimensions differentially affect the test scores for different groups of examinees. In this case, the item is a relevant measure of the target construct and the difference between the groups reflects a true difference on that construct. Differential item functioning (DIF) studies are designed to identify and interpret these construct-related dimensions using a combination of statistical and substantive analyses.
The statistical analysis involves administering the test, matching members of the reference and focal groups on a measure of ability derived from that test, and using statistical procedures to identify group differences on test items. An item exhibits DIF when examinees from the reference and focal groups differ, on average, in their probabilities of answering that item correctly, after controlling for ability. The substantive analysis builds on the statistical analysis: DIF items are often scrutinized by expert reviewers (e.g., test developers or content specialists) who attempt to identify the construct-related dimensions that produce group differences. A DIF item is considered biased when reviewers identify some dimension, deemed to be irrelevant to the construct measured by the test, that places one group of examinees at a disadvantage. Conversely, a DIF item displays impact when the dimension that differentiates the groups is judged to be relevant to the construct measured by the test. Considerable progress has been made in the development and refinement of statistical methods for identifying items showing DIF (see reviews by Clauser & Mazor, 1998; Millsap & Everson, 1993), but the development and refinement of substantive methods designed to aid with the interpretation of these items have lagged far behind (e.g., Bond, 1993; Camilli & Shepard, 1994; Engelhard, Hansche, & Rutledge, 1990; Gierl, Bisanz, Bisanz, Boughton, & Khaliq, 2001; Gierl, Rogers, & Klinger, 1999; O’Neill & McPeek, 1993; Plake, 1980; Roussos & Stout, 1996; Standards for Educational and Psychological Testing, 1999; Stout, 2002; Sudweeks & Tolman, 1993). The traditional approach, subjecting items flagged by DIF analyses to the scrutiny of reviewers, has not been successful because the interpretations tend to be inconsistent with the DIF statistics or unreliable among reviewers.
For example, Camilli and Shepard (1994) reported that, in their experience, as many as half of the items with “large” DIF in any one study might not be interpretable. Angoff (1993) noted: "It has been reported by test developers that they are often confronted by DIF results that they cannot understand; and no amount of deliberation seems to help explain why some perfectly reasonable items have large DIF values" (p. 19). Roussos and Stout (1996) reviewed the DIF literature and claimed, “attempts at understanding the underlying causes of DIF using substantive analyses of statistically identified DIF items have, with few exceptions, met with overwhelming failure” (p. 360). The authors of the Standards for Educational and Psychological Testing (1999) concluded:
Although DIF procedures may hold some promise for improving test quality, there has been little progress in identifying the causes or substantive themes that characterize items exhibiting DIF. That is, once items on a test have been statistically identified as functioning differently from one examinee group to another, it has been difficult to specify the reasons for the differential performance or to identify a common deficiency among the identified items. (p. 78)
This impasse represents a fundamental problem in the study of group differences using DIF methods. Roussos and Stout (1996) proposed a multidimensionality-based DIF analysis paradigm to bridge the gap between statistical and substantive analyses by linking both to the Shealy-Stout multidimensional model for DIF (Shealy & Stout, 1993). The first stage is a substantive analysis where DIF hypotheses are generated. The second stage is a statistical analysis where the DIF hypotheses are tested. By combining statistical and substantive analyses in a multidimensional framework, researchers and practitioners can begin to systematically identify and interpret the construct-related dimensions that produce group differences using DIF methods.
The purpose of this paper is twofold. In the first section, we describe the Roussos and Stout (1996) DIF analysis framework. In the second section, we illustrate, using two examples, how the DIF analysis framework can lead to more interpretable results about the dimensions that produce group differences when compared with the traditional approach to DIF detection. We conclude with a summary and highlight some implications for practice.
DIF Analysis Framework: An Overview
Roussos and Stout (1996) proposed a multidimensionality-based DIF analysis framework to link substantive and statistical analyses so researchers and practitioners can begin to systematically identify and study the sources of DIF. The DIF analysis framework is rooted in the Shealy and Stout (1993) multidimensional model for DIF (MMD), which serves as a theoretical basis for understanding how DIF occurs. A dimension is a substantive characteristic of an item that can affect the probability of a correct response. The main construct the test is intended to measure is the primary or target dimension. The MMD is based on two assumptions: (a) DIF items elicit at least one secondary dimension in addition to the primary dimension, and (b) a difference exists between the two groups of interest in their conditional distributions on the secondary dimension, given a fixed value on the primary dimension. Roussos and Stout (1996) further classified the secondary dimensions. The secondary dimensions are auxiliary if they are intentionally assessed as part of the construct on the test, which implies the construct of interest contains multiple dimensions. DIF caused by auxiliary dimensions is benign (reflecting impact). Alternatively, the secondary dimensions are nuisance if they are assessed unintentionally, that is, if they are not part of the construct the test is intended to measure. DIF caused by nuisance dimensions is adverse (reflecting bias).
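The two MMD assumptions can be made concrete with a small numerical sketch in Python. It assumes a compensatory two-dimensional logistic item response function, with θ as the primary dimension and η as a secondary dimension, and a hypothetical focal group whose conditional mean on η is 0.5 lower than the reference group's at every θ. The logistic form, the function names, and all parameter values are assumptions chosen only for illustration.

```python
import math

def p_correct(theta, eta, a1, a2, b):
    """Compensatory two-dimensional logistic item response function (assumed form).

    theta is the primary dimension, eta a secondary dimension; a2 = 0 means
    the item measures only the primary dimension.
    """
    return 1.0 / (1.0 + math.exp(-(a1 * theta + a2 * eta - b)))

def expected_p_given_theta(theta, eta_mean, eta_sd, a1, a2, b, steps=161):
    """Average P(correct | theta) over a normal conditional distribution of eta,
    approximated on a fixed grid from -4 to +4 standard deviations."""
    total = weight_sum = 0.0
    for i in range(steps):
        z = -4.0 + 8.0 * i / (steps - 1)
        w = math.exp(-0.5 * z * z)  # unnormalized standard normal density
        total += w * p_correct(theta, eta_mean + eta_sd * z, a1, a2, b)
        weight_sum += w
    return total / weight_sum

# Hypothetical setup: at the same theta, the focal group's conditional mean
# on eta is 0.5 lower than the reference group's (MMD assumption b).
theta = 0.0
results = {}
for label, a2 in [("primary only", 0.0), ("primary + secondary", 1.0)]:
    ref = expected_p_given_theta(theta, 0.0, 1.0, a1=1.0, a2=a2, b=0.0)
    foc = expected_p_given_theta(theta, -0.5, 1.0, a1=1.0, a2=a2, b=0.0)
    results[label] = ref - foc
    print(f"{label}: P_R - P_F = {ref - foc:.3f}")
```

With these settings, the item that measures only θ shows no conditional group difference, while the item that also measures η favors the reference group at the same θ. Both MMD assumptions have to hold for DIF to appear: removing either the item's dependence on η (a2 = 0) or the group difference in η given θ removes the DIF.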
To illustrate, on a test of mathematics achievement, knowledge of mathematics might be the primary dimension, critical thinking might be an auxiliary secondary dimension, and testwiseness (i.e., using strategies to select the correct answer based on knowledge of test item characteristics) might be a nuisance secondary dimension. If a DIF item favors females and this difference can be attributed to the critical thinking auxiliary dimension, when considered in isolation from the mathematics primary dimension, then the DIF is considered benign. Alternatively, if a DIF item favors males and this difference can be attributed to the testwiseness nuisance dimension, when considered in isolation from the mathematics primary dimension, then the DIF is considered adverse.
The Roussos-Stout (1996) DIF analysis framework is a two-stage procedure built on the foundation provided by the MMD. The first stage is a substantive analysis in which the dimensional structure of the test is evaluated and, based on this structure, DIF hypotheses are generated. To decide whether the data contain distinct dimensions, organizing principles are used to identify single items or bundles of items that share certain characteristics. Four different organizing principles have been used to identify dimensions on tests (Ackerman et al., 2003; Douglas et al., 1996; Gierl et al., 2001; Roussos & Stout, 1996). First, test specifications can guide the assessment of dimensionality. Test specifications outline the achievement domain and help test developers obtain a representative sample of items from this domain. The specifications also guide item writing and help structure the final form of the test based on the content and cognitive domain that the test is designed to measure.
Thus, a thorough analysis of the content areas measured by the test and the cognitive skills required by the examinees to solve the items may help identify subsets of items that measure distinct dimensions associated with these content areas and cognitive skills (e.g., Gierl et al., 2001; Oshima, Raju, Flowers, & Slinde, 1996). Second, a content analysis can guide the assessment of dimensionality. For example, content specialists can review items and identify dimensions based on specific item content. A content analysis is guided by the professional experience of the reviewers. Two variations of content review can be used: specialists may use their experience and judgment to identify dimensions during an item review (e.g., Bolt et al., 1996; Douglas et al., 1996), or content-based judgments about well-known tests reported in the literature can be used to guide interpretation (e.g., Gierl & Bolt, 2003). Third, psychological analyses can guide dimensionality assessment when the hypothesized item structure is formulated from a psychological perspective. For example, a cognitive task analysis could be used to identify skills that characterize mathematics performance (e.g., Gallagher, De Lisi, Holst, McGillicuddy-De Lisi, Morely, & Cahalan, 2000). These cognitive skills could be identified and operationalized using test items to inform a dimensionality assessment (e.g., Gierl et al., in press). Fourth, empirical analyses can guide dimensionality assessment by using statistical methods to facilitate the identification of dimensions. Empirical approaches include, but are not limited to, factor analysis, cluster analysis, latent class analysis, and multidimensional scaling. The outcomes from these empirical approaches are then interpreted.
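As one example of the fourth organizing principle, the following Python sketch groups items with average-linkage agglomerative clustering, using one minus the inter-item correlation (the phi coefficient for items scored 0/1) as the distance. The distance measure, the requested number of clusters, the function names, and the toy data are all assumptions for illustration; the resulting item bundles are only candidate dimensions that still require substantive interpretation.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length score vectors (phi for 0/1 items)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # an item with no variance carries no correlational information
    return sxy / math.sqrt(sxx * syy)

def cluster_items(responses, n_clusters):
    """Average-linkage agglomerative clustering of items, with distance 1 - r.

    responses: list of examinee response vectors (rows = examinees, columns = items).
    Returns a list of clusters, each a sorted list of item indices.
    """
    n_items = len(responses[0])
    columns = [[row[j] for row in responses] for j in range(n_items)]
    dist = {(i, j): 1.0 - pearson(columns[i], columns[j])
            for i, j in combinations(range(n_items), 2)}
    clusters = [[j] for j in range(n_items)]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average inter-item distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(min(i, j), max(i, j)) for i in clusters[a] for j in clusters[b]]
            d = sum(dist[p] for p in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return clusters

# Toy data: items 0-2 share one response pattern, items 3-4 another.
responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
print(cluster_items(responses, n_clusters=2))  # → [[0, 1, 2], [3, 4]]
```

In practice, dedicated dimensionality procedures or established statistical software would replace this hand-rolled loop; the sketch only shows the kind of output, item bundles, that the substantive analysis must then interpret.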
An empirical approach is substantive to the extent that the dimensions it identifies are, in fact, interpretable (cf. Douglas, Kim, Roussos, Stout, & Zhang, 1999; Kupermintz, Ennis, Hamilton, Talbert, & Snow, 1995; Hamilton, Nussbaum, Kupermintz, Kerkhoven, & Snow, 1995). Once the dimensions are identified, they must be interpreted as either primary or secondary and, further, the secondary dimensions must be classified as auxiliary or nuisance. Then, DIF hypotheses can be formulated to guide the study of group differences. The DIF hypotheses specify whether a single item or bundle of items designed to measure the primary dimension also measures a secondary dimension, thereby producing DIF across specific groups of examinees. DIF attributed to auxiliary secondary dimensions is benign, whereas DIF attributed to nuisance secondary dimensions is adverse.
The second stage in the Roussos-Stout DIF analysis framework is statistically testing the dimensionality-based DIF hypotheses. The statistical analyses are used to see whether the organizing principles reveal distinct primary and secondary dimensions across the groups under study. SIBTEST is used to test DIF hypotheses and quantify the size of DIF (Stout & Roussos, 1995). To operationalize SIBTEST, items on the standardized test are divided into the matching and studied subtests based on the dimensions identified in the substantive analysis. The matching subtest contains items believed to measure only the primary dimension. This subtest should be an accurate measure of a unidimensional matching criterion because examinees in each subgroup are placed at the same score level on it so that their performance on items from the studied subtest can be compared. The studied subtest, by contrast, contains items suspected of measuring both the primary and secondary dimensions.
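The subtest split and the score-level comparison described above can be sketched in Python as follows. The dimension labels and function names are hypothetical, the weights are the combined-group proportions at each matching-score level (an assumption made for illustration), and the sketch deliberately omits SIBTEST's regression correction of the conditional means, so it is a simplified, standardization-style estimate rather than SIBTEST itself.

```python
from collections import defaultdict

def split_subtests(item_dimensions):
    """Partition items into a matching subtest (primary dimension only) and a
    studied subtest (items hypothesized to also measure a secondary dimension).

    item_dimensions: {item_id: set of dimension labels}; "primary" marks the
    target dimension (labels are hypothetical).
    """
    matching = [i for i, dims in item_dimensions.items() if dims == {"primary"}]
    studied = [i for i, dims in item_dimensions.items()
               if "primary" in dims and len(dims) > 1]
    return matching, studied

def weighted_conditional_difference(examinees, matching, studied):
    """Weighted mean of (reference - focal) studied-subtest means across
    matching-subtest score levels; no regression correction is applied.

    examinees: list of (group, {item_id: item score}), group "R" or "F".
    """
    cells = defaultdict(lambda: {"R": [], "F": []})
    for group, scores in examinees:
        k = sum(scores[i] for i in matching)  # matching-subtest total score
        y = sum(scores[i] for i in studied)   # studied-subtest score
        cells[k][group].append(y)
    # Keep only score levels where both groups are represented.
    usable = {k: c for k, c in cells.items() if c["R"] and c["F"]}
    n_total = sum(len(c["R"]) + len(c["F"]) for c in usable.values())
    estimate = 0.0
    for c in usable.values():
        weight = (len(c["R"]) + len(c["F"])) / n_total
        mean_r = sum(c["R"]) / len(c["R"])
        mean_f = sum(c["F"]) / len(c["F"])
        estimate += weight * (mean_r - mean_f)
    return estimate

item_dimensions = {
    "m1": {"primary"},
    "m2": {"primary"},
    "s1": {"primary", "critical thinking"},
}
matching, studied = split_subtests(item_dimensions)
examinees = [
    ("R", {"m1": 1, "m2": 0, "s1": 1}),
    ("R", {"m1": 1, "m2": 1, "s1": 1}),
    ("F", {"m1": 0, "m2": 1, "s1": 0}),
    ("F", {"m1": 1, "m2": 1, "s1": 1}),
]
print(matching, studied)  # → ['m1', 'm2'] ['s1']
print(weighted_conditional_difference(examinees, matching, studied))  # → 0.5
```

A positive value indicates that, at equal matching-subtest scores, the reference group scores higher on the studied subtest. Because the matching score stands in for the primary dimension, any item wrongly placed in the matching subtest contaminates every comparison, which is why the substantive split matters.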
Because the subtests are formed from the dimensions identified in the first stage, the accuracy and interpretability of the statistical outcomes in the second stage depend, in part, on the accuracy and interpretability of those substantive dimensionality analyses. SIBTEST then uses differences across groups in the expected scores, conditional on the primary dimension, to test for DIF. The method can be applied to either dichotomously scored items (Shealy & Stout, 1993) or polytomously scored items (Chang, Mazzeo, & Roussos, 1995). Because the approach is essentially the same under both scoring conditions, the more general polytomous case is described. As a first step, SIBTEST estimates $ES_R(\theta)$ and $ES_F(\theta)$, the expected scores on a studied item conditional on the primary dimension $\theta$ for the reference and focal groups, respectively. However, in place of $\theta$, SIBTEST uses total scores on a matching subtest of items. Then the expected item scores are estimated as
Similar Resources
Implications of the Multidimensionality-Based DIF Analysis Framework for Selecting a Matching and Studied Subtest
In this paper we describe and illustrate the Roussos-Stout (1996) multidimensionality-based DIF analysis framework, with emphasis on its implication for the selection of a matching and studied subtest for DIF analyses. Standard DIF practice encourages an exploratory search for matching subtest items based on purely statistical criteria, such as a failure to display DIF. By contrast, the multidi...
Research-Based Teaching and Learning within a Constructivist Framework: Designing a Phenomenologically-Based Model
To explain the process of research-based methods of teaching and learning based on the professional knowledge of educational experts, and identify the benchmarks in using such a method, 25 academics at Farhangiyaan University and Ministry of Education were interviewed. The data were used to answer two basic questions: What characterizes research-based teaching and learning, and based on...
Using the Multidimensionality-Based DIF Analysis Paradigm to Study Cognitive Skills that Elicit Group Differences: A Critique
described a three-step approach used by researchers and practitioners attempting to identify biased test items: First, statistical methods are used to find items for which there are unexpected differences in performance between two groups (e.g., men and women). Second, each potentially biased item is examined for the reasons it is relatively more difficult for a particular group of examinees. T...
Evaluating EFL Learners’ Philosophical Mentality through their Answers to Philosophical Questions: Using Smith’s Framework
Given the role philosophical mentality can fulfill in bringing individuals the essential skills of wisdom and well thinking, the present paper, by applying Smith’s (2007) theoretical framework, strived to explore the extent philosophic-mindedness exists among the participants. Considering the fact that, a philosophic mind begets philosophical answers, the participants’ philosophical thi...
Identification of the most important factors of ethnic differences in anthropometric dimensions of Iranian workers using the decision tree
Background and aims: Anthropometry is the branch of human science that considers the physical measurement of the human body, especially size and shape. One application of anthropometrical data in ergonomics is the design of working space and the development of industrialized products. So that the tools, equipment and workstations, which designed based on the physical dimensions of the workers, ...